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Abstract 

Comparative genomics has become widely accepted as the major framework for the ascertainment of functionally important 
regions in genomes. The underlying paradigm of this approach is that most of the functional regions are assumed to be 
under selective constraint, which in turn reduces the rate of evolution relative to neutrality. This assumption allows detection 
of functional regions through sequence conservation. However, constraint does not always lead to sequence conservation. 
When purifying selection is weak and mutation is biased, constrained regions can even evolve faster than neutral sequences 
and thus can appear to be under positive selection. Moreover, conservation estimates depend also on the orientation of 
selection relative to mutational biases and can vary over time. In the light of recent data of the ubiquity of mutational biases 
and weak selective forces, these effects should reduce the power of conservation analyses to define functional regions using 
comparative genomics data. We argue that the estimation of true mutational biases and the use of explicit evolutionary 
models are essential to improve methods inferring the action of natural selection and functionality in genome sequences. 
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Introduction 

Identifying functionally important regions of genomes is a key 
challenge in evolutionary biology. Fueled by the availability of 
whole-genome sequence data for a constantly growing num- 
ber of species, comparative genomics has emerged as the 
standard framework for the identification of functional 
regions (Hardison 2003; Pheasant and Mattick 2007). 

The approach taken by comparative genomics is based on 
the assumption that mutations in functional regions would 
often be deleterious and thus filtered out by purifying selec- 
tion, reducing the rate of evolution in functional regions rel- 
ative to nonfunctional neutrally evolving regions. This 
signature is also commonly referred to as sequence conser- 
vation, which in the comparative genomics context is de- 
tected as regions of reduced divergence compared with 
neutrally evolving regions in sequence alignments. The more 
critical a functional region is, the greater the purifying selec- 
tion to maintain it and the greater the signature of sequence 
conservation we expect to see. Most current comparative 



genomics approaches to identify and classify functional re- 
gions are built on this paradigm (Waterston et al. 2002; 
Cooper et al. 2005; Siepel et al. 2005; Margulies et al. 
2007; Pheasant and Mattick 2007; Eory et al. 201 0; Goode 
et al. 2010; Pollard et al. 2010). 

The notion that functionality entails sequence conserva- 
tion is rooted in Kimura's influential concept of the "neutral 
theory of molecular evolution" (Kimura 1983). Neutral the- 
ory surmises that positive selection is so infrequent that its 
contribution to the rate of evolution is negligible. The 
majority of sites in a genome are assumed to evolve neu- 
trally, whereas some fraction, f c , of functional sites are under 
strong selective constraint. If we define the rate of evolu- 
tion, r, in terms of the rate at which new mutations fix in 
the population, then in functional regions r = (1 - f c ) r 0 , 
where r 0 is the rate of evolution in the neutrally evolving re- 
gions. The common conception that increasing constraint 
can only decrease the rate of evolution, r/r 0 < 1, emerges 
as an immediate consequence. 
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An accepted shortcoming of this approach is that some or 
even many functional elements are affected by nonequilib- 
rium processes such as positive selection, nonstationary mu- 
tation rates, and roving hot spots of biased gene conversion 
(BGC) (Pollard et al. 2006a; Galtier and Duret 2007; 
Pheasant and Mattick 2007; Berglund et al. 2009; Duret 

2009) . However, it is still a wide-held assumption that, in 
an equilibrium scenario, constraint necessarily entails 
sequence conservation. This notion is not justified — it has 
been shown more than a decade ago that the interplay 
of mutational biases and weak constraint can be quite com- 
plex; the rate of evolution at constrained sites can even be 
higher than that at neutral sites (McVean and Charlesworth 
1999). 

A bulk of recent evidence points to the ubiquity of 
the necessary ingredients — mutational biases and weak 
selective forces — for this effect to occur. Weak purifying 
selection appears to be acting in substantial parts of the 
genomes of many species (Bustamante et al. 2002, 
2005; Ohta 2002; Lu and Wu 2005; Comeron 2006; 
Lipatov et al. 2006; Eyre-Walker and Keightley 2007; Eory 
et al. 2010). Furthermore, the process of BGC, which intro- 
duces fixation biases in a way similar to that of weak puri- 
fying selection (Nagylaki 1 983), operates in most eukaryotic 
genomes and is the best candidate to explain observed 
genome-wide systematic biases in fixation probabilities 
(Galtier and Duret 2007; Duret and Arndt 2008; Duret 
and Galtier 2009). 

Mutational biases have also been observed in many 
organisms, and weak selection-like forces often seem to 
be acting in opposition to the mutational biases, as indicated 
by the observation that genomic nucleotide contents are 
typically less biased than would be expected from the 
underlying mutational biases (Lynch et al. 2008; Hershberg 
and Petrov 2010; Hildebrand et al. 2010; Ossowski et al. 

2010) . The complexity of how sequences evolve under 
realistic scenarios of weak constraint and mutational biases 
has important ramifications for conservation analysis and its 
evolutionary interpretation. 

Using a generalized Markov process to emulate the 
mutation-selection dynamics governing a genomic site, 
we investigate how complex interactions between weak 
selection and mutational biases affect different measure- 
ments of conservation under different scenarios. We first 
recapitulated the results of McVean and Charlesworth 
(1999) that weak constraint can accelerate the rate of 
evolution over that at neutral sites. 

We demonstrate that this effect complicates the infer- 
ence of constraint in the maximum-likelihood (ML) 
branch-length analysis that underlies many comparative 
genomics approaches such as GERP (Cooper et al. 2005) 
and PhyloP (Pollard et al. 2010). As a practical example, 
we investigate how GERP conservation scores are affected 
over a realistic (mammalian) species tree. We find that 



constrained sequence regions do not always show the 
signatures of sequence conservation. Furthermore, for 
the same strength of weak constraint, the measurement 
of conservation will typically vary with branch length 
and depend on the orientation of selection, that is, which 
particular bases are the preferred states in relation to the 
mutational biases. The power of ML methods to detect 
functional regions may thus be substantially reduced in 
regions of weak constraint. We discuss how inference 
methods that disentangle mutation and selection can 
improve such analyses. 

Materials and Methods 

Markov Models of Sequence Evolution 

The evolution of DNA sequences can be modeled as 
a Markov process specified by a substitution rate matrix 
R (Lio and Goldman 1 998). Its elements, R,y, denote the rates 
at which a nucleotide /' is substituted by a nucleotide j; 
diagonal elements are R,-, = -Yj^ip t ne total rate away 
from /. 

If all sites in a sequence are assumed to evolve according 
to the same model and independently of each other, we can 
specify the equilibrium nucleotide composition, p = 
{pa.Pc.Pg.Pj}, as the eigenvector corresponding to the larg- 
est eigenvalue of R, which is guaranteed to be zero: 

P R = 0. (1) 

The stationary rate of evolution, that is, the average rate 
of substitution per site in equilibrium, is then: 

( 2 ) 

The probability P,y(f) of observing nucleotide j after time t 
if the ancestral state at that site was /' can be calculated from 
the matrix exponential: 

P(f, R) = e Rt . (3) 

Notice that P(f,R) allows for the possibility of any number 
of substitutions having occurred during the time interval f. 
Diagonal elements P,-,(f,R) specify the probabilities of observ- 
ing no change. When two sequences separated by time fare 
compared, their expected observed divergence d(t), that is, 
the fraction of sites at which the sequences differ, can be 
calculated as the total probability of observing different base 
pairs at homologous sites: 

d(f)=$>[i-/Kf,R)]- (4) 

To disentangle the processes of mutation and selection, 
we first define a mutation rate matrix Q with its elements 
specifying the rates at which new mutations /' to j occur in 
individuals. Substitution rates can then be decomposed into 
the product of mutation rates (i,y, the effective (haploid) 
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population size N and the probability of such mutations 
eventually fixing in the population, v,/. 

R/j = Hij x N x Vjj. (5) 

In this framework, specific biases in the raw rates at 
which new mutations occur can be incorporated into the 
mutation rates n,y. The fixation probabilities v,y can account 
for selective advantages or disadvantages of a nucleotide j 
over a different nucleotide /. If we specify with s(i,f) the se- 
lection coefficient of nucleotide j relative to /', the fixation 
probabilities are approximately v,y = 2s(/j)/[1-e~ Y(u) ] 
(Kimura 1983). Note that R,y then only depends on \i ri 
and the amount of effective selection, y(i,j) = 2Ns(i,j). 
The particular scenario of purifying selection at a given site 
is specified in terms of the y(i,j) for all ij e {A,C,G,T}. For ex- 
ample, if nucleotide a is the preferred nucleotide at that site 
and it is preferred at an equal strength y func over all other 
nucleotides, whereas mutations between each two unpre- 
ferred nucleotides are selectively neutral, then: y(i,a) = yf Unc 
= -y(a,i) for all / jt a and y(i,j) = 0 for all ij a. 

We model BGC analogously to purifying selection prefer- 
ring C/G alleles over A/T alleles at specific strength y BGC . 
Mutations C <-> G and A <-> Tare not affected. In the sce- 
narios where both BGC and purifying selection are acting, 
the resulting effective selection is the sum: y(ij) = y B Gc('j) + 

VfuncvV)- 

Under neutrality and in the absence of BGC (y = 0), we 
have v,y = 1//V and thus R = Q. The mutational equilibrium 
composition, n, is then determined by nQ = 0. The total rate 
of evolution yields r = J^f^i^j^iV-ij- and the expected diver- 
gence between two neutrally evolving sequences separated 
by time t is given by d(f) = £>,{1-P,7(f,Q)]. 

We always assume reverse complement symmetry (\i AJ = 
Uta and u C g = hgc)- Under this assumption, it is convenient 
to describe the raw mutational bias in terms of the equilib- 
rium A/T content in the absence of BGC and selection, 7t A+T 
= n A +n T . Note that detailed balance yields n A+J = (hc/g-> 
aaMha/wc/g + Hc/g-a/t)- A mutational bias of 7t A+T = 0.8, 
for example, implies that mutations from C or G to A or T 
occur four times more often than the reverse process. 

The scale in which time is measured can be chosen freely 
in the above framework. By convention, R is normalized 
such that time is measured in units of expected number 
of substitutions at neutrally evolving sites, that is, it is deter- 
mined by the condition XXX^i 1 ') = ^ -2 



ML Branch-Length Inference 

Let D be the observed pair-count matrix in a two-sequence 
alignment of length n, that is, elements D,y denote at how 
many of the n positions in the alignment nucleotide /' is 
observed in the first sequence while the second sequence 
has nucleotide / The total observed divergence between 



two sequences, d, is the sum of the nondiagonal elements 
of D. Equation (3) specifies a probabilistic model for D given 
a rate matrix R and a divergence time t. For an empirical 
observation D and assuming an inference model M, we 
can then define a likelihood function for the divergence time 
f assuming that the sequences have evolved under M and 
that nucleotide content is in equilibrium: 

L(f) = Pr[D|M,f] (6) 

This is the multinomial probability to observe, in n trials, 
counts D,y under the normalization condition YliPii = n an d 
given individual probabilities p,y(f,M) = p,- P,y(f,M) per trial. 
Therefore: 

log L(t) = const + ^ D f log Pf (t, M). (7) 
ij 

For an inference model M and an observed count matrix 
D, the ML estimate t* is obtained by maximizing log L(t) with 
respect to t. The constant does not depend on t and can be 
omitted from the maximization. Note that we implicitly as- 
sumed a reversible mutation model here. The above frame- 
work can be extended to include nonreversible models by 
using the definition p,y{f,M) = J^kPk Pfa{?"/2,M)P^f/2,M). 

To ascertain whether a test region is evolutionarily con- 
served, one typically compares the ML branch-length esti- 
mate t* of this test region with an estimate inferred 
from a neutrally evolving reference region. Conservation 
then corresponds to f7fo<1, that is, a reduction in the 
branch-length inferred from the test region compared with 
the reference region. Note that one has to specify the par- 
ticular inference model M used in equation (7). 

In figures 2 and 3, we investigate how the choice of M 
affects conservation estimates c when the true substitution 
model for the test sequence is R and the true substitution 
model for the reference sequence is R 0 . To obtain the 
expected ML branch-length estimates for given R and R 0 , 
we first calculate the expected count matrices <D> and 
<D 0 > under these models: 

<D> ? = nxpjf(f,R) and <Do>ff = n xpj(f,R 0 ). (8) 

For the scenario of figure 3 where purifying selection in 
the test sequence is randomly oriented, there are four dif- 
ferent R models, one for each of the four possible preferred 
states. Because all bases have equal probability of being the 
preferred base at a given site, all four R models contribute 
equally to the count matrix D. 

Given <D> and <D 0 >, ML estimates for t* and to 
are then calculated from equation (7) using the specific 
inference model M. Note that these calculations do not 
depend on the number of sites in the alignment; for conve- 
nience, we chose n = 1 . Likelihoods were maximized 
numerically in Java. 
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Running GERP on Simulated Alignments Over 
a Tree 

We simulate the evolution of a site over a species tree using 
a Markov model of nucleotide substitution specified by a rate 
matrix R. For each site, we first draw its state at the root of the 
tree from the equilibrium frequencies p of the rate matrix and 
then move down the tree drawing the states at each node. 
The probability of being in state / at node k where the ancestral 
node k-} was assigned base j is given by equation (3), with 
the time t specifying the true neutral branch length between 
the two nodes in the tree. The states at the leaves of the tree 
then specify the multiple alignment column at that site. 

We used the tree from the 44-way MULTIZ alignments 
from the UCSC Genome Browser (http://hgdownload. 
cse.ucsc.edu/goldenPath/hg18/phastCons44way/placental- 
Mammals.mod), restricted to its 32 eutherian mammalian 
members. The total number of expected neutral substitu- 
tions per site for the restricted tree is 4.74. The mutation 
model is an HKY85 model with mutational bias rc A+T = 
0.8 and a transition/transversion ratio of 4.0. 

In our evolutionary scenario, every site in the test 
sequence is modeled to be under purifying selection of 
strength y func with the preferred base being randomly 
chosen. BGC is acting uniformly over the sequence on 
top of purifying selection, favoring C/G over A/T alleles 
at strength y B Gc- Hence, there are again four different 
substitution models, one for each preferred nucleotide. 
Multiple sequence alignments were generated by concat- 
enating the resulting alignment columns of 10 5 individu- 
ally simulated sites. The simulated sequence alignments 
together with the correct neutral tree were fed to GERP 
2.1 (http://mendel.stanford.edu/SidowLab/downloads/ 
gerp/index.html), using its default parameters. 

Results 

Faster than Neutral Evolution of Constrained 
Sequences 

The basic result that sequences evolving under weak con- 
straint and biased mutation can evolve faster than neutral 
sequences was first observed by McVean and Charlesworth 
(1999) in their analysis of the rate of evolution at synony- 
mous positions in the presence of mutational biases and 
selection on codon usage. 

In figure 1 , we recapitulate their result in a general equi- 
librium scenario where mutations are biased toward A/T, 
that is, the equilibrium A/T content in the absence of selec- 
tion, 7t A+T , is larger than 0.5, whereas weak purifying 
selection favors C/G alleles over A/T alleles at an effective 
strength y = 2Ns (the selection coefficient s rescaled by 
twice the haploid population size A/). Mutations C <-> G 
and A <-> Tare selectively neutral. Figure 14 shows the ratio 
of the average rate of substitution per site in equilibrium at 



constrained sites, r, and at neutral sites, r 0 , in this scenario 
(Materials and Methods). When purifying selection is weak, 
the rate of evolution at constrained sites is indeed higher 
than that at the neutral sites (r/r 0 > 1 ). This acceleration be- 
comes stronger as the mutational bias toward A/T becomes 
larger. For instance, when n A+J = 0.8, the maximal rate of 
evolution at constrained sites is —8% higher than that at the 
neutral sites. 

The particular value of the effective strength of selec- 
tion at which the rate of evolution is at its maximum 
increases as mutations become more biased. When the 
mutational bias is ti a+t = 0.7, for example, the maximal 

rate is observed when y 0.6, whereas for 7i a+ t = 0.8, 

the maximum is at y 1 (fig. 14). As expected, when 

purifying selection becomes strong enough, the rate of 
evolution decreases again below the neutral rate, and 
for sufficiently strong purifying selection (e.g., y < 
-1.5 for 7t A+T = 0.8), one always observes conservation 
(r/r 0 <1). 

These results provide a clear illustration that even in fully 
equilibrium scenarios of constraint where selectively favored 
states do not change in time and all rates are stationary, 
constraint does not necessarily result in conservation. 
Intuitively, this rate acceleration can be explained by the 
weak purifying selection effectively lowering biases in the 
equilibrium nucleotide frequencies. This can be seen in 
figure 16 where the expected equilibrium A/T content at 
the constrained sites, p A+J , is plotted for the strength of 
purifying selection yielding the maximal rate acceleration 
for a given mutational bias n A+T . Common mutations drive 
substitutions away from the fitter states despite purifying 
selection, whereas selection favors fixation of uncommon 
mutations resulting in faster back substitutions to the fitter 
states. This allows for greater overall flux between states and 
thus a higher rate of substitution at the constrained sites 
compared with the neutrally evolving sites. 

Rates of evolution are typically not measured directly. 
Instead, one observes sequence divergence. Figure 1C 
shows that the average expected divergence at constrained 
sites, d, can also be systematically higher than that at the 
neutral sites, do, and that this effect increases with diver- 
gence time. Similarly to figure 1A, the effect is again more 
pronounced when mutations are strongly biased away from 
the selectively preferred states. 

The paradigm that constraint necessarily entails reduc- 
tion in the rate of evolution does not hold in this scenario. 
In fact, given that the expected divergence at the 
constrained sites can be higher than that at the neutral 
sites, such sites might even be inferred to evolve under 
positive selection. 

Effects on ML Branch-Length Inference 

Today's toolbox of comparative genomics has expanded 
greatly from simply using the observed divergence in 
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Fig. 1. — Accelerated rate of evolution when selection is counteracting a mutational bias. In the shown scenario, purifying selection favors C/G 
bases over A/T bases. Mutations occur according to a standard HKY85 model specified by the mutational bias, 7i A+T and a transition/transversion ratio of 
four. Mutations G ** C and A *+ Tare neutral and mutational biases are symmetric (it A = kj, k c = n G ). (A) Ratio of equilibrium substitution rate, r, over 
the equilibrium neutral rate, r 0 , calculated according to equation (2). Values of r/r 0 < 1 indicate sequence conservation; values r/r 0 > 1 indicate rate 
acceleration. Rate acceleration is observed if weak purifying selection counteracts mutationally preferred states. (6) Comparison of mutational bias and 
actual composition bias for the selection coefficients that yield maximal rate acceleration for the respective mutational bias. Weak purifying selection 
counteracting the mutational biases effectively lowers the resulting composition bias compared with that expected if no selection were acting. Gray 
areas denote the respective regions where |Ap A+T | < |Atc a +t|- (O Increase of divergence, d, over neutral divergence, do, for the selection coefficients 
that yield maximal rate acceleration for the respective mutational biases. Time is measured in units of the expected number of neutral substitutions per 
site. 



pairwise alignments for the inference of conservation. 
Modern approaches attempt to infer from a sequence 
alignment the true number of substitutions that have oc- 
curred since the divergence of the sequences (Felsenstein 
2004). This can be achieved in a full ML framework by as- 
suming a probabilistic model of the substitution process 
and correcting for "multiple hits" at the same site. The 
inferred total count of substitutions reflects the evolution- 
ary time (branch-length) separating the two sequences. 
For conservation analysis, the ML branch-length esti- 
mates, f, of a test sequence region is compared with that 
of a presumably neutrally evolving reference sequence re- 
gion, to- The ratio of the two branch-lengths, t*/t Q , di- 
rectly relates to the ratio r/r 0 of the substitution rates in 
the test region and in the reference region. Sequence con- 
servation hence corresponds to f7f 0 <1, whereas f7f 0 >1 
indicates rate acceleration. 

ML branch-length inference requires the underlying 
neutral substitution model to be specified (Materials and 
Methods). Because the "true" neutral substitution model 
is generally unknown, one typically makes simplifying 
assumptions. In the simplest, the Jukes and Cantor 
(1969) (JC69) model, substitutions between different 
nucleotides are assumed to occur all at the same rate. 
The HKY85 model (Hasegawa et al. 1985) additionally 
allows for different equilibrium nucleotide frequencies 
and transition/transversion ratios. 

The equilibrium nucleotide frequencies of the HKY85 
model can be chosen in several ways: The HKY85:rc model 
uses the nucleotide content inferred from a strictly neutrally 



evolving reference sequence for both the test and the ref- 
erence sequence. In the HKY85:p model, nucleotide fre- 
quencies are the actual nucleotide contents of the two 
sequence regions and can thus differ between the test 
and the reference region. 

In figure 2, we consider the performance of ML branch- 
length inference for the three inference models JC69, 
HKY85:tc, and HKY85:p in an evolutionary scenario where 
mutation is uniformly biased in favor of A/T (n AJ = 0.8) in 
both the test and the reference region (Materials and Meth- 
ods). The reference region is evolving neutrally, whereas in 
the test region, weak purifying selection is favoring C/G 
alleles over A/T alleles (fig. 2A and O or A/T alleles over 
C/G alleles (fig. 26 and D). Mutations C <-> G and A <-> T 
are selectively neutral in all four cases. 

Note that in this scenario, the HKY85:7i inference 
model uses the true mutational bias n c+G = 0.2 and 7t A+T 
= 0.8 in both the reference and test sequence. The 
HKY85:p inference model uses n for the reference se- 
quence, but for the test sequence, it assumes the resulting 
equilibrium frequencies from mutation and purifying 
selection, p. 

When selection acts in favor of C/G with y = — 1 (fig. 2A), 
the JC69 and HKY85:p models both infer rate acceleration 
(f7fo>1), whereas the HKY85:7t model infers conservation 
(f7fo<1). The inferred branch-length extensions under JC69 
and HKY85:p are time dependent and become larger with 
the time of divergence. Both, branch-length extension and 
time dependence, are more pronounced under the JC69 
model than under the HKY85:p model. 
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Fig. 2. — Performance of ML branch-length estimation in the presence of mutational biases and weak selective constraint using different neutral 
inference models. In all four evolutionary scenarios, the true mutation model is an HKY85 model specified by an equilibrium AT content n: A+T = 0.8 at 
neutrally evolving sites and a transition/transversion ratio of four. In (A) and (0, C/G bases are preferred over A/T bases at constrained sites. Purifying 
selection is thus acting in opposition to the mutationally preferred states. In (6) and (D), A/T bases are preferred over C/G bases; purifying selection is 
acting in unison with the mutational biases. ML branch-length estimates at constrained sites, t", and at neutral sites, fo, were calculated using three 
different neutral inference models: the Jukes-Cantor (JC69) model, the HKY85 model with its mutational biases estimated from the data (HKY85:p) and 
the HKY85 model using the true neutral mutation parameters (HKY85:jt). Each used the default value of four for the transition/transversion ratio. 
Inferred branch-length ratios, flt* 0 , are shown as a function of true divergence time f measured in units of the average number of substitutions per 
neutral site. Values f*/fo < 1 indicate sequence conservation, whereas f7fo > 1 indicates faster than neutral evolution. 



At first glance, it might seem surprising that the 
HKY85:7t model "correctly" infers conservation in figure 
2A. Given that it uses the correct neutral substitution 
model one might expect that, similar to figure \A, 
constrained sites would be inferred to evolve faster than 
neutral sites. The HKY85:7t model is indeed the true 
substitution model for neutral sites and will therefore 
infer the correct ML branch-length estimate f 0 at those 
sites. At constrained sites, however, it assumes the wrong 
equilibrium frequencies (n ^ p). In the test region, the 
HKY85:7t model will overestimate the equilibrium A/T 
frequency (7t A+T > p a +t) and the overall rate at which 
substitutions from C/G to A/T should occur. As a result, 
in the fashion of "two wrongs making it right," it infers 
branch-length reduction even though the rate of 
evolution is in fact higher at these sites than at the neutral 



sites. The HKY85:p model in contrast can be compared 
directly with the scenario of figure 1/\ because it uses 
the correct equilibrium frequencies in both the neutral 
and constrained cases. Consistent with the rate 
acceleration in figure 1A, the HKY85:p model infers 
branch-length extension in figure 2A. 

One of the consequences of these patterns is that 
purifying selection of the same strength but operating 
in different orientations relative to the mutational bias 
leads to different estimations of conservation. For in- 
stance, when comparing the results of figure 2A with 
figure 26, it can be seen that changing the favored nucle- 
otide pair from C/G to A/T (while maintaining the same 
mutational bias toward A/T) markedly alters the level 
of inferred conservation, whereas the actual strength 
of purifying selection is the same in the two scenarios. 
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The conservation as measured by the HKY85:p model is 
lowered by 20% to 45%, whereas there is an increase 
up to 10% in the conservation as measured by the 
HKY85:ti model. 

In figure 2Cand D, the strength of selection is increased 
to y = —2. The discrepancies in the inferred strength of con- 
servation between scenarios with different selection orien- 
tation but same selection strength do not vanish at this 
higher selection coefficient. This can be understood from 
figure \A when comparing the curves for rc A+T = 0.3 
and 7i A+T = 0.7 (changing the mutational bias from n A+J 
to 1 - 7t A+T is comparable to changing the orientation of 
selection while keeping the mutational bias constant). 
The shift in r/r 0 between the two curves is almost the same 
for y = -1 and -2. 

Regardless of whether branch-length reduction or 
extension is observed, estimates of branch-length ratios 
are generally time dependent in figure 2. In the extreme, 
this can lead to situations where conservation will be esti- 
mated for short divergence times, whereas at larger diver- 
gence times, constrained sequences will appear to have 
evolved faster than neutral, as is the case in figure 2C 
for the JC69 model. These effects are a function of the 
mutational bias and become weaker as the mutational bias 
is reduced (supplementary fig. S1, Supplementary Material 
online). The time dependence is a consequence of the fact 
that selection does not uniformly decrease all individual 
substitution rates. Substitutions C <-> G and A <-» T, for 
example, are still effectively neutral in the constrained 
sequence. This cannot be accounted for by a constant, sca- 
lar reduction in branch lengths. The resulting error in the 
multiple-hits correction is compounded as time increases 
and thus, in principle, should be less pronounced for more 
closely related species. 



Estimating Conservation in the Presence of BGC 
and Randomly Oriented Selection 

In the above analysis, we assumed that all sites in the ref- 
erence sequence are truly neutral and, in the case of the 
HKY85:p model, that we know the correct allele equilibrium 
frequencies at every site. Neither assumption is likely to hold 
in practice. Selection and selection-like forces, especially 
BGC, can operate in supposedly "neutral" regions. Estimat- 
ing a true neutral model from any sequence would thus 
prove difficult without a priori knowledge of the strength 
of this effect. The underlying assumption of the HKY85:p 
model, that we actually know the correct equilibrium fre- 
quencies p at every site of the test sequence, is also unre- 
alistic. Over a functional locus, preferred bases of selection 
will likely be in some jumble of orientations and different 
selective forces might be operating on the region and even 
on the same site. 



To investigate how such complications affect ML branch- 
length inference, we created different reference and test 
models that incorporate BGC as well as heterogeneous 
models of selection in the test sequence. The mutational 
bias is again assumed to be n A + T = 0.8 in the reference 
and test sequence. BGC is modeled as a selective preference 
for C/G alleles over A/Talleles, whereas changes G <-> C and 
A «-> T are not affected, similar to the selection scenario 
setup in figures 1 and 2. We assume BGC to operate in both 
the reference sequence and the test sequence with same 
strength ybgc- We investigate three scenarios: ybgc = {0, 
- 1 , -2}. In the test sequence, functional purifying selection 
is acting on top of BGC, whereas in the reference sequence, 
BGC is acting alone. We assume that at every site in the 
test sequence one base is selectively preferred over the 
three other bases. Which base is the preferred nucleotide 
at a given site is randomly chosen with all four states 
{A,C,G,T} having equal probability. Functional selection 
operates at all sites in the test sequence at the same 
strength, Yfunc- 

ML branch-length analysis is performed for the HKY85:p 
and HKY85:ti inference models. As before, the HKY85:ti 
model uses the true neutral substitution biases, n, without 
BGC or selection in both test and reference region. The 
HKY85:p model, in contrast, uses the expected nucleotide 
frequencies, p, of the sequences. In the reference sequence, 
p is the equilibrium nucleotide content resulting from 
mutational biases and BGC. In the test sequences, p incor- 
porates mutational bias, BGC, and functional selection 
averaged over the test sequence. 

Figure 3 shows the resulting conservation estimates. In 
the top two panels, the strength of functional selection is 
Yfunc = -1, whereas in the bottom two panels Yfunc = 
-2. The left panels show conservation estimates for the 
subset of sites where the preferred state is C and the right 
panels show the results for sites where A is preferred. Due to 
the symmetry in the specification of the mutation-selection 
models, conservation estimates are equivalent across A and 
T sites as well as across C and G sites. 

Without BGC (ybgc = 0), the HKY85:ti and the HKY85:p 
models behave similarly. Much like the HKY85:7i model from 
figure 2, they always infer conservation. The reason for this 
is that weak selection that is randomly oriented has little 
effect on the overall content bias (p A +T = 0-78 in the test 
sequence for that scenario), and thus p « n. As before, ML 
branch-length inference still suffers from asymmetries and 
time dependence. 

At the sites where C is the preferred state by selection (fig. 
3A and Q, increasing the strength of BGC from ybgc = -1 to 
Ybgc = -2 leads to more conservation. In this scenario, BGC 
and purifying selection are operating in the same direction, 
which can be interpreted as effective selection for C of 
strength ybgc in the reference sequence and of strength 
Yfunc + Ybgc in the test sequence. Although the difference 
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Fig. 3. — Performance of ML branch-length estimation in the presence of mutational biases, randomly oriented weak constraint, and BGC under 
the HKY85:p and the HKY85:it inference models. The mutational bias is always ir A+T = 0.8 with a transition/transversion ratio of four. (A) and (0 show 
the results where C is the preferred base, whereas (8) and (D) show the results where A is the preferred base. Different colors indicate different 
strengths of BGC, which uniformly favors C/G over A/T alleles in both the test and the reference sequence. 



in the effective strength of selection between the two is in- 
dependent of the strength of BGC, conservation estimates 
are not. This is a consequence of the fact that the rate of evo- 
lution is a concave function over the relevant range of effec- 
tive selection coefficients, as can be seen in figure 1A 
Similarly, at the sites where A is the preferred state by selection 
(fig. 3B and D), increasing the strength of BGC generally leads 
to less conservation. In this scenario, BGC and purifying se- 
lection are operating in opposite direction. Thus, there is ef- 
fectively less selection on the test sequence than the 
reference sequence. In both cases, the increase/decrease 
in conservation estimates due to BGC becomes more pro- 
found in the HKY85:7i inference model than in the HKY85:p 
model. 

If the orientation of functional selection to mutational bias 
is random, then, in the absence of BGC, all functional sites 
will indeed show sequence conservation regardless of the ori- 
entation of selection to mutation. The presence of BGC, how- 
ever, magnifies the asymmetries in the inferred amount of 
conservation between sites with different preferred bases 
with some sites appearing to evolve faster than neutral. These 



asymmetries arise even in the absence of strong mutational 
biases (supplementary fig. S2, Supplementary Material 
online). BGC could thus further reduce the power to detect 
functional regions on the basis of sequence conservation. 

A Practical Application: GERP RS-Scores on 
a Realistic Species Tree 

We have investigated the behavior of ML branch-length 
inference in an instructive generalized framework to eluci- 
date how the link between conservation and purifying 
selection can be misleading when selective forces are weak 
and mutations are biased. ML branch-length inference 
underlies many popular methods for conservation analysis 
such as GERP (Cooper et al. 2005) and phyloP (Pollard 
et al. 2010), raising the question to what extent such tools 
will be affected by these problems. 

In order to investigate the effects on an exemplary prac- 
tical application, we tested the performance of GERP on 
simulated multiple sequence alignments over a realistic 
species tree (Materials and Methods). The sequences in 
the test alignment were modeled to have evolved in 
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Fig. 4. — Performance of GERP on simulated sequence alignments over a realistic 32 mammalian species tree. The alignment sites were modeled to 
have evolved under mutational biases, randomly oriented weak constraint, and BGC. {A) Shows the mean RS-scores of all sites where C was the 
preferred state as a function of the strength of functional selection: y func . (B) Shows results for the sites where A was preferred state. The mutational 
bias was again it A+T = 0.8 with a transition/transversion ratio of four. Positive RS-scores indicate branch-length reduction as the number of substitutions 
is lower than expected under neutrality (4.74) — the equivalent of f*/fo<1. Negative RS-scores indicate more substitutions having occurred than 
expected — he equivalent of f*/fo> 1 . RS-scores were normalized such that the neutral class (yf unc = 0) has an RS-score of zero. Different colors indicate 
different strengths of BGC, which uniformly favors C/G over A/T alleles. 



a scenario with mutational biases, BGC, and weak purify- 
ing selection. Mutations were again biased toward A/Twith 
tca+t = 0.8, and BGC was acting uniformly over the 
genome, favoring C/G over A/T bases at strength y B Gc- 
In addition, purifying selection favored at every site a ran- 
domly selected base with the strength y func , analogous to 
the scenario in the previous section. 

To obtain its site-wise rejected substitution scores (RS- 
scores), GERP calculates the difference between the number 
of substitutions expected to have occurred if the site was 
evolving neutrally and the actual number of substitutions 
inferred from the alignment at each site. GERP needs to 
be provided with the correct neutral tree. Underlying its 
ML inference is an HKY85 mutation model; the transi- 
tion/transversion ratio needs to be specified by the user. 
Equilibrium nucleotide frequencies are estimated directly 
from the alignments (corresponding to our HKY85:p model 
from fig. 3). For our analysis, we provided GERP with the 
true tree that was used to generate the simulated sequence 
alignments and the correct transition/transversion ratio. 

Figure 4 shows the mean RS-scores at sites where puri- 
fying selection favors C (fig. 4A) and at those sites where 
it favors A (fig. 4B), as a function of the strength of purifying 
selection (median RS-scores are shown in supplementary 
fig. S3, Supplementary Material online). Due to symmetry, 
A and T sites as well as C and G sites are again equivalent. 
Results are shown for three different BGC scenarios: 
Tbgc = {0,-1,-2}. 

The number of expected substitutions per site under neu- 
trality and without BGC is 4.74 for our tree. According to 
figure 1, BGC of strength y BGC = -1 should then increase 
the average number of substitutions above the expectation of 
4.74 substitutions per site, leading to positive RS-scores in the 



neutral scenario (y func = 0). In the scenario with y BGC = -2, 
fewer substitutions should occur and RS-scores should 
be negative in the neutral scenario. Because we want to 
compare RS-scores between constrained and neutral sites, 
RS-scores were shifted such that the class without functional 
selection (yf unc = 0) always has a score of zero. The mag- 
nitude of the resulting shift can be estimated from the dif- 
ference between the limiting RS-score at large yf Unc and the 
neutral expectation of 4.74 without BGC (at functional 
selection strengths of y func < -6 one obtains full conser- 
vation at each site). 

As expected, GERP shows behavior similar to the theoret- 
ical HKY:p model in figure 3 at short timescales. In the 
absence of BGC, the sites where C is preferred appear to 
have evolved neutrally when functional selection is weaker 

than y func 1.5. Sites where A is the preferred state, on 

the other hand, are inferred constrained for all y func < 0. 
When BGC is acting, this pattern switches: conservation will 
be inferred at sites where C is weakly preferred, whereas 
sites where A is weakly preferred now appear to have 
evolved neutrally. For strong BGC (y BGC = -2), RS-scores 
at the sites where A is preferred even drop below zero 
for -2 < y func < 0, indicating a faster than neutral rate 
of evolution in that range. 

The complications emerging from the interplay of muta- 
tional biases, BGC, and weak functional selection thus also 
affect practical applications to infer sequence conservation 
from multiple sequence alignments, as exemplified here for 
the tool GERP. Depending on the orientation of functional 
selection, constrained sites will not always appear to be 
constrained and can even show less conservation than un- 
constrained sites. If conservation is inferred, the level of con- 
servation will be highly asymmetric depending on which 
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base is the preferred state. These complications become 
even more profound when the mutational biases are weaker 
(supplementary fig. S4, Supplementary Material online). In 
functional regions, we typically expect the orientation of 
selection to be in some jumble over the different nucleoti- 
des. When averaged across such loci they should then ap- 
pear less constrained than they actually are due to the lack 
of conservation at the sites where preferred bases and mu- 
tation oppose. As such, sitewise as well as regional con- 
servation estimates may not always perfectly reflect the 
presence of functional selection. 

Discussion 

Interactions between mutational biases, BGC, and weak 
selection can have intricate effects on sequence conserva- 
tion and its inference: the rate of evolution at constrained 
sites can actually be higher compared with neutrally evolv- 
ing sites, and conservation estimates will typically depend 
on the orientation of selection and vary over time. 

Integral to these complications is the presence of muta- 
tional biases. Recent studies point to the ubiquity of such 
mutational biases across a range of organisms. In the three 
classical genetic model organisms Drosophila melanogaster, 
Saccharomyces cerevisiae, and Arabidopsis thaliana, muta- 
tion accumulation studies revealed mutational biases push- 
ing their genomes toward high equilibrium A/T contents. In 
Drosophila, the mutational equilibrium was calculated to be 
67% A/T (Keightley et al. 2009); in yeast, the mutational 
bias should yield 74% A/T in equilibrium (Lynch et al. 
2008); and in Arabidopsis, the mutational bias should push 
the genome toward 85% A/T (Ossowski et al. 2010). Like- 
wise, the mutations in mitochondria of S. cerevisiae, Caeno- 
rhabditiseiegans, and D. melanogaster appear to be highly 
biased (Haag-Liautard et al. 2008; Montooth and Rand 
2008; Montooth et al. 2009). Mutations also seem to be 
generally biased toward A/T in prokaryotes (Hershberg 
and Petrov 2010; Hildebrand et al. 2010). 

In addition to requiring mutational biases, selection needs 
to be weak for the complications in conservation estimation 
to arise. In order for substantial parts of a genome to be af- 
fected, many sites would have to be evolving under such 
weak selective forces. Is there evidence that weak 
selection is indeed common? Evidence of abundant weak 
purifying selection has been given by numerous population 
genetic studies (Bustamante et al. 2002, 2005; Ohta 2002; 
Lu and Wu 2005; Comeron 2006; Lipatov et al. 2006; 
Eyre-Walker and Keightley 2007; Eory et al. 2010). Further- 
more, in genomic alignments, sites with intermediate GERP 
scores (those scores between neutrally evolving and totally 
conserved) are very common in constrained elements and 
coding sequences (Davydov et al. 2010; Goode et al. 
2010). According to figure 4, for a site to appear completely 
conserved across our tree does not require strong selection. 



In fact, when y = -6, almost all sites should appear 
completely conserved. Given the preponderance of sites 
with intermediate RS-scores, either constraint is weak at 
many sites or it varies over the tree such that the inferred 
constraint appears weak. If the former, then large parts 
of the genomes are indeed evolving in the parameter regime 
considered by our study. 

There is also indirect evidence that weak selective forces 
are widespread across the genome in the form of discrep- 
ancies between mutational biases and genomic nucleotide 
contents. For instance, the A/T content of the yeast genome 
is only 60% as compared with its mutational equilibrium of 
74% A/T (Lynch et al. 2008). In Arabidopsis, while the 
mutational bias should drive the genome toward 85% 
A/T, the actual intronic/degenerate codon A/T content 
bias is only 65/68% (Ossowski et al. 2010). If these differ- 
ences are not assumed to reflect varying mutational biases 
over time, selective forces (i.e., either natural selection or 
BGC) need to be causing biases in fixation probabilities that 
partly compensate for the mutational biases. To yield the 
observed genomic content biases, the strength of such 
forces would correspond to effective selection of strength 
y ~ -0.7 for fixation of A/T versus C/G in yeast and 
y 1 in Arabidopsis. For both cases, these points are al- 
most exactly where r/r 0 has its maximum elevation for the 
respective mutational biases (fig. 1,4). Many regions of the 
yeast and Arabidopsis genomes might then in fact be evolv- 
ing more rapidly than they would if fixation probabilities 
were solely determined by random genetic drift. 

In bacteria, the differences between mutational biases 
and genomic nucleotide composition are typically even 
more profound. Although recent evidence suggests that 
bacterial mutations is universally biased toward A/T (Hersh- 
berg and Petrov 2010; Hildebrand et al. 2010), genomic 
nucleotide contents can vary widely from —20 to —80% 
A/T. These examples illustrate various situations where weak 
selective forces seem to be acting systematically in opposi- 
tion to existing mutational biases. 

It thus seems that both necessary ingredients for the com- 
plications in conservation estimates to arise, mutational 
biases and weak selective forces, are likely to be common 
in nature. Consequently, a substantial fraction of genomic 
sites should suffer from misleading conservation estimates. 
What are the implications for conservation analysis? 

Conservation clearly remains a consistent and useful 
measure for evolutionary constraint where selection is suf- 
ficiently strong and also for weak constraint if mutations 
are unbiased. After all, comparative genomics approaches 
based on sequence conservation have proven extremely suc- 
cessful in annotating functional regions. Regions evolving 
under weak constraint, however, could often be missed 
by current approaches as they will not always show the 
expected signature of conservation. In particular, sites 
where selection and mutational biases oppose might 
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actually be inferred to have evolved under positive selection. 
Such effects should generally reduce the power of ap- 
proaches that identify functional regions from conservation 
signatures. This could be a contributing factor to why half of 
the functional regions characterized by the ENCODE project 
(Birney et al. 2007) do not show conservation signatures 
(Pheasant and Mattick 2007). 

In addition, estimated conservation scores are difficult to 
relate to the effective strength of selection operating on 
a site. As we have shown, conservation estimates are highly 
asymmetric with regard to the preferred base and typically 
vary over time. Purifying selection of the same strength can 
thus give rise to very different conservation estimates 
depending on the particular scenario. 

Our analysis also highlights a notorious general problem 
of conservation analysis: the assumption of having a truly 
neutrally evolving reference sequence for comparison with 
the test region. If BGC or selection are the underlying causes 
for the observed discrepancies between mutational biases 
and genomic nucleotide contents, then these processes 
would likely be acting throughout the genome, making it 
very difficult to find truly neutrally evolving regions. Our re- 
sults indicate that this should generally exaggerate the com- 
plications in conservation analysis. Moreover, BGC is likely to 
show regional variation along a genome. For example, it has 
been suggested that some fraction of the so-called human- 
accelerated regions, which show an increase of substitu- 
tions on the human lineage when compared with their 
homologous regions across a phylogeny (Pollard et al. 
2006b), are in fact constrained sequences to which recently 
a BGC hot spot has moved (Pollard et al. 2006a; Galtier and 
Duret2007; Berglundetal. 2009). Variation inthe efficacy of 
BGC along genomes will lead to particular sequence regions 
evolving faster or slower than others, further obfuscating 
conservation estimates. 

The limitations of conservation analysis presented here 
spotlight the need for more accurate inference methods 
in comparative genomics in order to also capture regions 
evolving under weak constraint. Substitution models more 
reflective of the actual substitution processes seem essential 
to these improvements. Such substitution models should 
disentangle the actions of mutation and selection, incorpo- 
rating the true mutational biases and models of selection 
that explicitly account for fixation probabilities. Constraint 
can then be inferred directly by estimating the parameters 
of the mutation-selection model without the poten- 
tially misleading estimation of conservation with respect 
to a reference sequence. 

Recent advances in experimental techniques render 
possible an unbiased estimation of mutation biases. With 
the advent of new sequencing technologies, resequenc- 
ing large numbers of genomes has become a practical 
endeavor (Durbin et al. 2010). This opens up the possibil- 
ity for whole-genome sequencing of mutation accumula- 



tion lines, as well as the analysis of deep polymorphism 
data from natural populations. Both approaches should 
allow for more accurate estimates of the mutational spec- 
trum (Lynch et al. 2008; Messer 2009; Ossowski et al. 
2010). Factors such as neighbor-dependent mutation 
rates or transcription-associated mutational asymmetry 
ideally should also be considered. 

Substitution models that explicitly incorporate the action 
of selection do also already exist (Halpern and Bruno 1998; 
Moses et al. 2004; Doniger and Fay 2007; Yang and Nielsen 
2008; Berglund et al. 2009; Rodrigue et al. 2010). The first 
attempt applied to codon substitution models (Halpern 
and Bruno 1998) suffered from a very high number of pa- 
rameters as the fitness for each amino acid at every position 
in the protein had to be estimated. Recent work has ame- 
liorated this issue by effectively reducing the number of 
free parameters while retaining some of the site-wise flex- 
ibility (Rodrigue et al. 201 0). Other applications of mutation- 
selection models have focused on more tractable scenarios 
with intrinsically fewer parameters such as codon bias 
and BGC (Yang and Nielsen 2008; Berglund et al. 2009). 
Models for transcription factor binding sites have taken 
the concept one step further by additionally using 
functional information from position weight matrices as 
fitness parameters (Moses et al. 2004; Doniger and Fay 
2007). For the scenarios to which they have been applied, 
mutation-selection models have been shown to generally 
outperform simple neutral models (Yang and Nielsen 
2008; Rodrigue et al. 2010). The results presented in this 
paper suggest some of the reasons for this increase in 
performance. 

With the seeming pervasiveness of strong mutational 
biases, weak selective constraint, and BGC, the substitu- 
tion dynamics of any given genomic locus is likely to be 
poorly captured by neutral models. The ingredients for 
better inference models are 2-fold: 1) a precise measure- 
ment of the true underlying mutational biases and 2) the 
disentanglement of mutation and selection together with 
a more accurate modeling of the specific nature of the se- 
lective forces. Using a methodology with such ingredients 
built-in, constraint can be inferred directly, circumventing 
the complications arising from indirect inference via con- 
servation. Much work is needed to independently mea- 
sure mutational biases in many organisms and to 
improve the efficiency and fidelity of the selection 
models, while controlling model complexity and avoiding 
overfitting. Given the advantages, however, the design 
and implementation of such explicit mutation-selection 
models is highly desirable. 

Supplementary Material 

Supplementary figures S1-S4 are available at Genome Biol- 
ogy and Evolution online (http://gbe.oxfordjournals.org/). 
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